AITopics | thompson sampling algorithm

Collaborating Authors

thompson sampling algorithm

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Finite-Time Regret of Thompson Sampling Algorithms for Exponential Family Multi-Armed Bandits

Neural Information Processing SystemsDec-25-2025, 19:02:55 GMT

We study the regret of Thompson sampling (TS) algorithms for exponential family bandits, where the reward distribution is from a one-dimensional exponential family, which covers many common reward distributions including Bernoulli, Gaussian, Gamma, Exponential, etc. We propose a Thompson sampling algorithm, termed ExpTS, which uses a novel sampling distribution to avoid the under-estimation of the optimal arm. We provide a tight regret analysis for ExpTS, which simultaneously yields both the finite-time regret bound as well as the asymptotic regret bound. In particular, for a $K$-armed bandit with exponential family rewards, ExpTS over a horizon $T$ is sub-UCB (a strong criterion for the finite-time regret that is problem-dependent), minimax optimal up to a factor $\sqrt{\log K}$, and asymptotically optimal, for exponential family rewards. Moreover, we propose ExpTS$^+$, by adding a greedy exploitation step in addition to the sampling distribution used in ExpTS, to avoid the over-estimation of sub-optimal arms. ExpTS$^+$ is an anytime bandit algorithm and achieves the minimax optimality and asymptotic optimality simultaneously for exponential family reward distributions. Our proof techniques are general and conceptually simple and can be easily applied to analyze standard Thompson sampling with specific reward distributions.

finite-time regret, reward distribution, thompson sampling algorithm, (6 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.63)
Information Technology > Data Science > Data Mining > Big Data (0.40)

Add feedback

Finite-Time Regret of Thompson Sampling Algorithms for Exponential Family Multi-Armed Bandits

Neural Information Processing SystemsJan-19-2025, 08:09:22 GMT

We study the regret of Thompson sampling (TS) algorithms for exponential family bandits, where the reward distribution is from a one-dimensional exponential family, which covers many common reward distributions including Bernoulli, Gaussian, Gamma, Exponential, etc. We propose a Thompson sampling algorithm, termed ExpTS, which uses a novel sampling distribution to avoid the under-estimation of the optimal arm. We provide a tight regret analysis for ExpTS, which simultaneously yields both the finite-time regret bound as well as the asymptotic regret bound. In particular, for a K -armed bandit with exponential family rewards, ExpTS over a horizon T is sub-UCB (a strong criterion for the finite-time regret that is problem-dependent), minimax optimal up to a factor \sqrt{\log K}, and asymptotically optimal, for exponential family rewards. Moreover, we propose ExpTS, by adding a greedy exploitation step in addition to the sampling distribution used in ExpTS, to avoid the over-estimation of sub-optimal arms.

exponential family multi-armed bandit, reward distribution, thompson sampling algorithm, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.67)
Information Technology > Data Science > Data Mining > Big Data (0.54)

Add feedback

Thompson Sampling Itself is Differentially Private

Ou, Tingting, Medina, Marco Avella, Cummings, Rachel

arXiv.org Machine LearningJul-20-2024

In this work we first show that the classical Thompson sampling algorithm for multi-arm bandits is differentially private as-is, without any modification. We provide per-round privacy guarantees as a function of problem parameters and show composition over $T$ rounds; since the algorithm is unchanged, existing $O(\sqrt{NT\log N})$ regret bounds still hold and there is no loss in performance due to privacy. We then show that simple modifications -- such as pre-pulling all arms a fixed number of times, increasing the sampling variance -- can provide tighter privacy guarantees. We again provide privacy guarantees that now depend on the new parameters introduced in the modification, which allows the analyst to tune the privacy guarantee as desired. We also provide a novel regret analysis for this new algorithm, and show how the new parameters also impact expected regret. Finally, we empirically validate and illustrate our theoretical findings in two parameter regimes and demonstrate that tuning the new parameters substantially improve the privacy-regret tradeoff.

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Machine Learning

2407.14879

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Security & Privacy (0.92)
Information Technology > Data Science > Data Mining > Big Data (0.46)

Add feedback

Finite-Time Regret of Thompson Sampling Algorithms for Exponential Family Multi-Armed Bandits

Jin, Tianyuan, Xu, Pan, Xiao, Xiaokui, Anandkumar, Anima

arXiv.org Machine LearningJun-7-2022

artificial intelligence, big data, data mining, (3 more...)

arXiv.org Machine Learning

2206.0352

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence (0.93)
Information Technology > Data Science > Data Mining > Big Data (0.69)

Add feedback

Feel-Good Thompson Sampling for Contextual Bandits and Reinforcement Learning

Zhang, Tong

arXiv.org Machine LearningOct-2-2021

Thompson Sampling has been widely used for contextual bandit problems due to the flexibility of its modeling power. However, a general theory for this class of methods in the frequentist setting is still lacking. In this paper, we present a theoretical analysis of Thompson Sampling, with a focus on frequentist regret bounds. In this setting, we show that the standard Thompson Sampling is not aggressive enough in exploring new actions, leading to suboptimality in some pessimistic situations. A simple modification called Feel-Good Thompson Sampling, which favors high reward models more aggressively than the standard Thompson Sampling, is proposed to remedy this problem. We show that the theoretical framework can be used to derive Bayesian regret bounds for standard Thompson Sampling, and frequentist regret bounds for Feel-Good Thompson Sampling. It is shown that in both cases, we can reduce the bandit regret problem to online least squares regression estimation. For the frequentist analysis, the online least squares regression bound can be directly obtained using online aggregation techniques which have been well studied. The resulting bandit regret bound matches the minimax lower bound in the finite action case. Moreover, the analysis can be generalized to handle a class of linearly embeddable contextual bandit problems (which generalizes the popular linear contextual bandit model). The obtained result again matches the minimax lower bound. Finally we illustrate that the analysis can be extended to handle some MDP problems.

bandit, sampling, thompson sampling, (15 more...)

arXiv.org Machine Learning

2110.00871

Country:

Asia > Middle East > Jordan (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Data Science > Data Mining > Big Data (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.51)
(2 more...)

Add feedback

Distributed Thompson Sampling

Dong, Jing, Li, Tan, Ren, Shaolei, Song, Linqi

arXiv.org Artificial IntelligenceDec-3-2020

We study a cooperative multi-agent multi-armed bandits with M agents and K arms. The goal of the agents is to minimized the cumulative regret. We adapt a traditional Thompson Sampling algoirthm under the distributed setting. However, with agent's ability to communicate, we note that communication may further reduce the upper bound of the regret for a distributed Thompson Sampling approach. To further improve the performance of distributed Thompson Sampling, we propose a distributed Elimination based Thompson Sampling algorithm that allow the agents to learn collaboratively. We analyse the algorithm under Bernoulli reward and derived a problem dependent upper bound on the cumulative regret.

agent, stage 1, thompson sampling, (13 more...)

arXiv.org Artificial Intelligence

2012.01789

Country:

North America > United States > Michigan (0.04)
North America > United States > California > Riverside County > Riverside (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.69)

Add feedback

Thompson Sampling Algorithms for Mean-Variance Bandits

Zhu, Qiuyu, Tan, Vincent Y. F.

arXiv.org Machine LearningFeb-1-2020

The multi-armed bandit (MAB) problem is a classical learning task that exemplifies the exploration-exploitation tradeoff. However, standard formulations do not take into account risk. In online decision making systems, risk is a primary concern. In this regard, the mean-variance risk measure is one of the most common objective functions. Existing algorithms for mean-variance optimization in the context of MAB problems have unrealistic assumptions on the reward distributions. We develop Thompson Sampling-style algorithms for mean-variance MAB and provide comprehensive regret analyses for Gaussian and Bernoulli bandits with fewer assumptions. Our algorithms achieve the best known regret bounds for mean-variance MABs and also attain the information-theoretic bounds in some parameter regimes. Empirical simulations show that our algorithms significantly outperform existing LCB-based algorithms for all risk tolerances.

algorithm, mv 1, thompson sampling algorithm, (12 more...)

arXiv.org Machine Learning

2002.00232

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Singapore (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.67)

Add feedback

Thompson Sampling for Contextual Bandits with Linear Payoffs

Agrawal, Shipra, Goyal, Navin

arXiv.org Machine LearningFeb-3-2014

Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems. It is a randomized algorithm based on Bayesian ideas, and has recently generated significant interest after several studies demonstrated it to have better empirical performance compared to the state-of-the-art methods. However, many questions regarding its theoretical performance remained open. In this paper, we design and analyze a generalization of Thompson Sampling algorithm for the stochastic contextual multi-armed bandit problem with linear payoff functions, when the contexts are provided by an adaptive adversary. This is among the most important and widely studied versions of the contextual bandits problem. We provide the first theoretical guarantees for the contextual version of Thompson Sampling. We prove a high probability regret bound of $\tilde{O}(d^{3/2}\sqrt{T})$ (or $\tilde{O}(d\sqrt{T \log(N)})$), which is the best regret bound achieved by any computationally efficient algorithm available for this problem in the current literature, and is within a factor of $\sqrt{d}$ (or $\sqrt{\log(N)}$) of the information-theoretic lower bound for this problem.

artificial intelligence, data mining, machine learning, (15 more...)

arXiv.org Machine Learning

1209.3352

Genre:

Research Report > New Finding (0.48)
Research Report > Promising Solution (0.34)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Further Optimal Regret Bounds for Thompson Sampling

Agrawal, Shipra, Goyal, Navin

arXiv.org Machine LearningSep-14-2012

Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems. It is a randomized algorithm based on Bayesian ideas, and has recently generated significant interest after several studies demonstrated it to have better empirical performance compared to the state of the art methods. In this paper, we provide a novel regret analysis for Thompson Sampling that simultaneously proves both the optimal problem-dependent bound of $(1+\epsilon)\sum_i \frac{\ln T}{\Delta_i}+O(\frac{N}{\epsilon^2})$ and the first near-optimal problem-independent bound of $O(\sqrt{NT\ln T})$ on the expected regret of this algorithm. Our near-optimal problem-independent bound solves a COLT 2012 open problem of Chapelle and Li. The optimal problem-dependent regret bound for this problem was first proven recently by Kaufmann et al. [ALT 2012]. Our novel martingale-based analysis techniques are conceptually simple, easily extend to distributions other than the Beta distribution, and also extend to the more general contextual bandits setting [Manuscript, Agrawal and Goyal, 2012].

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Machine Learning

1209.3353

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback